{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# 1. Generating GCG Suffixes Using Azure Machine Learning"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1",
   "metadata": {},
   "source": [
    "This notebook shows how to generate GCG suffixes using Azure Machine Learning (AML), which consists of three main steps:\n",
    "1. Connect to an Azure Machine Learning (AML) workspace.\n",
    "2. Create AML Environment with the Python dependencies.\n",
    "3. Submit a training job to AML."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2",
   "metadata": {},
   "source": [
    "## Connect to Azure Machine Learning Workspace"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3",
   "metadata": {},
   "source": [
    "The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning (AML), providing a centralized place to work with all the artifacts you create when using AML. In this section, we will connect to the workspace in which the job will be run.\n",
    "\n",
    "To connect to a workspace, we need identifier parameters - a subscription, resource group and workspace name. We will use these details in the `MLClient` from `azure.ai.ml` to get a handle to the required AML workspace. We use the [default Azure authentication](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python) for this tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "romanlutz\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "\n",
    "# Enter details of your AML workspace\n",
    "subscription_id = os.environ.get(\"AZURE_SUBSCRIPTION_ID\")\n",
    "resource_group = os.environ.get(\"AZURE_RESOURCE_GROUP\")\n",
    "workspace = os.environ.get(\"AZURE_ML_WORKSPACE_NAME\")\n",
    "print(workspace)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5",
   "metadata": {},
   "outputs": [],
   "source": [
    "from azure.ai.ml import MLClient\n",
    "from azure.identity import AzureCliCredential\n",
    "\n",
    "# Get a handle to the workspace\n",
    "# For some people DefaultAzureCredential may work better than AzureCliCredential.\n",
    "ml_client = MLClient(AzureCliCredential(), subscription_id, resource_group, workspace)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6",
   "metadata": {},
   "source": [
    "## Create AML Environment"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7",
   "metadata": {
    "lines_to_next_cell": 0
   },
   "source": [
    "To install the dependencies needed to run GCG, we create an AML environment from a [Dockerfile](../../../pyrit/auxiliary_attacks/gcg/src/Dockerfile)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8",
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Environment({'arm_type': 'environment_version', 'latest_version': None, 'image': None, 'intellectual_property': None, 'is_anonymous': False, 'auto_increment_version': False, 'auto_delete_setting': None, 'name': 'pyrit', 'description': 'PyRIT environment created from a Docker context.', 'tags': {}, 'properties': {'azureml.labels': 'latest'}, 'print_as_yaml': False, 'id': '/subscriptions/db1ba766-2ca3-42c6-a19a-0f0d43134a8c/resourceGroups/romanlutz/providers/Microsoft.MachineLearningServices/workspaces/romanlutz/environments/pyrit/versions/5', 'Resource__source_path': '', 'base_path': 'c:\\\\Users\\\\Roman\\\\git\\\\PyRIT\\\\doc\\\\code\\\\auxiliary_attacks', 'creation_context': <azure.ai.ml.entities._system_data.SystemData object at 0x0000024025678AD0>, 'serialize': <msrest.serialization.Serializer object at 0x0000024025671470>, 'version': '5', 'conda_file': None, 'build': <azure.ai.ml.entities._assets.environment.BuildContext object at 0x00000240256FC410>, 'inference_config': None, 'os_type': 'Linux', 'conda_file_path': None, 'path': None, 'datastore': None, 'upload_hash': None, 'translated_conda_file': None})"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "from azure.ai.ml.entities import BuildContext, Environment, JobResourceConfiguration\n",
    "\n",
    "from pyrit.common.path import HOME_PATH\n",
    "\n",
    "# Configure the AML environment with path to Dockerfile and dependencies\n",
    "env_docker_context = Environment(\n",
    "    build=BuildContext(path=Path(HOME_PATH) / \"pyrit\" / \"auxiliary_attacks\" / \"gcg\" / \"src\"),\n",
    "    name=\"pyrit\",\n",
    "    description=\"PyRIT environment created from a Docker context.\",\n",
    ")\n",
    "\n",
    "# Create or update the AML environment\n",
    "ml_client.environments.create_or_update(env_docker_context)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9",
   "metadata": {},
   "source": [
    "## Submit Training Job to AML"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "10",
   "metadata": {},
   "source": [
    "Finally, we configure the command to run the GCG algorithm. The entry file for the algorithm is [`run.py`](../../../pyrit/auxiliary_attacks/gcg/experiments/run.py), which takes several command line arguments, as shown below. We also have to specify the compute `instance_type` to run the algorithm on. In our experience, a GPU instance with at least 32GB of vRAM is required. In the example below, we use Standard_NC96ads_A100_v4.\n",
    "\n",
    "Depending on the compute instance you use, you may encounter \"out of memory\" errors. In this case, we recommend training on a smaller model or lowering `n_train_data` or `batch_size`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11",
   "metadata": {},
   "outputs": [],
   "source": [
    "from azure.ai.ml import command\n",
    "\n",
    "# Configure the command\n",
    "job = command(\n",
    "    code=Path(HOME_PATH),\n",
    "    command=\"cd pyrit/auxiliary_attacks/gcg/experiments && python run.py --model_name ${{inputs.model_name}} --setup ${{inputs.setup}} --n_train_data ${{inputs.n_train_data}} --n_test_data ${{inputs.n_test_data}} --n_steps ${{inputs.n_steps}} --batch_size ${{inputs.batch_size}}\",\n",
    "    inputs={\n",
    "        \"model_name\": \"phi_3_mini\",\n",
    "        \"setup\": \"multiple\",\n",
    "        \"n_train_data\": 25,\n",
    "        \"n_test_data\": 0,\n",
    "        \"n_steps\": 500,\n",
    "        \"batch_size\": 256,\n",
    "    },\n",
    "    environment=f\"{env_docker_context.name}:{env_docker_context.version}\",\n",
    "    environment_variables={\"HUGGINGFACE_TOKEN\": os.environ[\"HUGGINGFACE_TOKEN\"]},\n",
    "    display_name=\"suffix_generation\",\n",
    "    description=\"Generate a suffix for attacking LLMs.\",\n",
    "    resources=JobResourceConfiguration(\n",
    "        instance_type=\"Standard_NC96ads_A100_v4\",\n",
    "        instance_count=1,\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Class AutoDeleteSettingSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.\n",
      "Class AutoDeleteConditionSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.\n",
      "Class BaseAutoDeleteSettingSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.\n",
      "Class IntellectualPropertySchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.\n",
      "Class ProtectionLevelSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.\n",
      "Class BaseIntellectualPropertySchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.\n",
      "Your file exceeds 100 MB. If you experience low speeds, latency, or broken connections, we recommend using the AzCopyv10 tool for this file transfer.\n",
      "\n",
      "Example: azcopy copy 'C:\\Users\\Roman\\git\\PyRIT' 'https://romanlutz0437468309.blob.core.windows.net/3f52e8b9-0bac-4c48-9e4a-a92e85a582c4-10s61nn9uso4b2p89xjypawyc7/PyRIT' \n",
      "\n",
      "See https://learn.microsoft.com/azure/storage/common/storage-use-azcopy-v10 for more information.\n",
      "\u001b[32mUploading PyRIT (194.65 MBs): 100%|##########| 194652493/194652493 [01:19<00:00, 2447407.71it/s] \n",
      "\u001b[39m\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Submit the command\n",
    "returned_job = ml_client.create_or_update(job)"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "cell_metadata_filter": "-all"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
